Thai Paragraph Shortening Based on Binary Classification Model
Abstract
Thai sentences can be simplified or shortened by cutting out some words without changing their meaning. In this paper, linear and non-linear (kernel) Fisher discriminant analysis are applied to shorten Thai paragraphs in a corpus. The features used are the unique word ID and part of speech of the target word and of its three preceding and three following words, together with the target word's role as a content or function word. Two scenarios are investigated: a global model and a document-specific model. The results demonstrate that both Fisher discriminant analysis and kernel Fisher discriminant analysis significantly improve classification accuracy over the baseline in both scenarios. We found that the part of speech of the target word is the most relevant feature, followed by the part of speech of the adjacent words. Moreover, the document-specific model achieved higher accuracy than the global model. This suggests that the author's writing style plays an important role in the paragraph shortening task.
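The feature set described in the abstract can be illustrated with a minimal sketch. It assumes each paragraph is already segmented into (word, part-of-speech) pairs. The function name `window_features`, the `PAD` placeholder, and the `CONTENT_POS` tag set are hypothetical choices for illustration, not taken from the paper. The word string stands in for the unique word ID, and the classifier itself (linear or kernel Fisher discriminant analysis) is not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical placeholder for window positions that fall outside the paragraph.
PAD = ("<PAD>", "PAD")
# Assumed set of part-of-speech tags treated as content words (illustrative only).
CONTENT_POS = {"NOUN", "VERB", "ADJ", "ADV"}


def window_features(
    tokens: List[Tuple[str, str]], i: int, width: int = 3
) -> Dict[str, object]:
    """Build features for the target token at index i.

    Collects the word and part of speech of the target word and of the
    `width` words before and after it, plus a content/function flag
    for the target word.
    """
    feats: Dict[str, object] = {}
    for offset in range(-width, width + 1):
        j = i + offset
        word, pos = tokens[j] if 0 <= j < len(tokens) else PAD
        feats[f"word[{offset:+d}]"] = word  # stands in for the unique word ID
        feats[f"pos[{offset:+d}]"] = pos
    feats["is_content"] = tokens[i][1] in CONTENT_POS
    return feats


# Toy usage with romanized placeholder tokens (not real corpus data).
paragraph = [("chan", "PRON"), ("chop", "VERB"), ("kin", "VERB"),
             ("khao", "NOUN"), ("mak", "ADV")]
features = window_features(paragraph, 2)
```

Each token then yields one feature dictionary, and a binary keep-or-cut label per token would be the classification target. The categorical values would still need a numeric encoding before being passed to a discriminant model.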
Similar resources
A new classification method based on pairwise SVM for facial age estimation
This paper presents a practical algorithm for facial age estimation from a frontal face image. Facial age estimation generally comprises two key steps: age image representation and age estimation. The anthropometric model used in this study includes computation of eighteen craniofacial ratios and a new accurate skin wrinkles analysis in the first step and a pairwise binary support vector...
Binary Paragraph Vectors
Recently Le & Mikolov described two log-linear models, called Paragraph Vector, that can be used to learn state-of-the-art distributed representations of documents. Inspired by this work, we present Binary Paragraph Vector models: simple neural networks that learn short binary codes for fast information retrieval. We show that binary paragraph vectors outperform autoencoder-based binary codes, d...
A High-Performance Model based on Ensembles for Twitter Sentiment Classification
Background and Objectives: Twitter Sentiment Classification is one of the most popular fields in information retrieval and text mining. Millions of people around the world use social networks like Twitter intensively. Twitter lets users publish tweets to say what they think about topics. There are numerous web sites built on the Internet presenting Twitter. The user can enter a sentiment ta...
A Hidden Conditional Random Field-Based Approach for Thai Tone Classification
In Thai, tonal information is a crucial component for identifying the lexical meaning of a word. Consequently, Thai tone classification can obviously improve performance of Thai speech recognition system. In this article, we therefore reported our study of Thai tone classification. Based on our investigation, most of Thai tone classification studies relied on statistical machine learning approa...
Thai News Text Summarization and Its Application
Since Thai language lacks word/phrase/sentence boundaries, document summarization in Thai needs investigations in unit segmentation, unit selection, redundancy removal and evaluation dataset construction. In this work, we have proposed Thai Elementary Discourse Unit (TEDU) and a three-stage method of Thai multidocument summarization, i.e., unit segmentation, unit-graph formulation, and unit sel...
Publication date: 2012